Assigning Geographical Scopes To Web Pages

Authors

  • Bruno Martins
  • Marcirio Silveira Chaves
  • Mário J. Silva
Abstract

Finding automatic ways of attaching geographical scopes to on-line resources, also called “geo-referencing” documents, is a challenging problem that is receiving increasing attention [1, 5, 3]. Here we present a system architecture and a process for identifying the geographical scope of Web pages, defining a scope as the region where more people than average would find that page relevant. We rely on typical Web IR heuristics (i.e. feature weighting, hypertext topic locality, anchor description) and on assumptions about how people use geographical references in documents. The method involves three major steps. First, geographical named entities are identified in the text. Next, the named entities found are propagated through the Web linkage graph. Finally, a geographical ontology is used to disambiguate among the named entities associated with a document, thereby selecting the most likely scope. In the future, we plan to use scopes in new location-aware search tools.
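The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology `PARENT`, the helper names, the `damping` factor, and the `threshold` rule for picking a scope are all assumptions introduced here for illustration only.

```python
from collections import Counter

# Hypothetical toy ontology (an assumption): place -> parent region, None for the root.
PARENT = {
    "Lisbon": "Portugal", "Porto": "Portugal", "Portugal": "Europe",
    "Madrid": "Spain", "Spain": "Europe", "Europe": None,
}


def extract_places(text):
    """Step 1 (sketch): find place names by naive gazetteer lookup on tokens."""
    tokens = [t.strip(".,;:!?()\"'") for t in text.split()]
    return Counter(t for t in tokens if t in PARENT)


def propagate(own, links, damping=0.5):
    """Step 2 (sketch): add damped place counts from the pages each page links to."""
    scores = {}
    for page, counts in own.items():
        merged = Counter(counts)
        for neighbor in links.get(page, []):
            for place, n in own.get(neighbor, {}).items():
                merged[place] += damping * n
        scores[page] = merged
    return scores


def ancestors(place):
    """Return the place followed by its enclosing regions, up to the root."""
    chain = []
    while place is not None:
        chain.append(place)
        place = PARENT[place]
    return chain


def select_scope(counts, threshold=0.6):
    """Step 3 (sketch): most specific region holding at least `threshold` of the weight."""
    total = sum(counts.values())
    if total == 0:
        return None
    support = Counter()
    for place, weight in counts.items():
        for region in ancestors(place):
            support[region] += weight
    candidates = [r for r, w in support.items() if w / total >= threshold]
    # The deepest node in the ontology is the most specific candidate.
    return max(candidates, key=lambda r: len(ancestors(r)), default=None)


if __name__ == "__main__":
    pages = {"a": "Hotels in Lisbon and Porto.", "b": "Lisbon nightlife, Lisbon food"}
    links = {"a": ["b"]}  # page "a" links to page "b"
    own = {page: extract_places(text) for page, text in pages.items()}
    print(select_scope(own["a"]))                    # Portugal
    print(select_scope(propagate(own, links)["a"]))  # Lisbon
```

In this toy run, page "a" alone mentions Lisbon and Porto equally, so the sketch falls back to their common parent; after the linked page's mentions are propagated, Lisbon carries enough weight to become the scope.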

Similar resources

Adding geographic scopes to web resources

Many Web pages are rich in geographic information and primarily relevant to geographically limited communities. However, existing IR systems only recently began to offer local services and largely ignore geo-spatial information. This paper presents our work on automatically identifying the geographical scope of Web documents, which provides the means to develop retrieval tools that take the geo...

A Comparison of Different Approaches for Assigning Geographic Scopes to Documents

In this paper, we compare different methods for the automatic assignment of geographic scopes to Web pages, based on placenames mentioned in the text. The methods under study are the Yahoo! Placemaker Web service, the hierarchy-based method originally proposed for the Web-a-Where system, the spatial overlap-based method originally proposed in the GIPSY project, the graph-based method originally...

Estimation of Web Contents Geographic Provenience Exploiting Creative Commons Licensed Pages for Training Set Aggregation

Geographic scope estimation is a fairly recent problem which is gaining increasing attention due to the broad implications in many different fields, ranging from the development of better search engines to the need to assess specific content production on a geographical basis. However, geographic scope is a concept that can be interpreted in many different ways, ranging from the expected target...

The place of place in geographical IR

The most common kind of collections for GIR so far are the Web and other document collections, which are mainly textual. This paper is concerned with the non-trivial relationship between reference to place in natural language (NL) and common GIR assumptions. There are two main ways in which NL texts and GIR meet: in the attempt to derive or populate geo-ontologies from text itself, and in the a...

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable the attackers to hide and obfuscate infectious codes with new methods and thus escaping the security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

Publication date: 2005